System and Method for an Asynchronous Processor with Asynchronous Instruction Fetch, Decode, and Issue

ABSTRACT

Embodiments are provided for an asynchronous processor with an asynchronous instruction fetch, decode, and issue unit. The asynchronous processor comprises an execution unit for asynchronous execution of a plurality of instructions, and a fetch, decode and issue unit configured for asynchronous decoding of the instructions. The fetch, decode and issue unit comprises a plurality of resources supporting functions of the fetch, decode and issue unit, and a plurality of decoders arranged in a predefined order for passing a plurality of tokens. The tokens control access of the decoders to the resources and allow the decoders exclusive access to the resources. The fetch, decode and issue unit also comprises an issuer unit for issuing the instructions from the decoders to the execution unit.

This application claims the benefit of U.S. Provisional Application No. 61/874,894 filed on Sep. 6, 2013 by Yiqun Ge et al. and entitled "Method and Apparatus for Asynchronous Processor with Asynchronous Instruction Fetch, Decode, and Issue," which is hereby incorporated herein by reference as if reproduced in its entirety.

TECHNICAL FIELD

The present invention relates to asynchronous processing, and, in particular embodiments, to a system and method for an asynchronous processor with asynchronous instruction fetch, decode, and issue.

BACKGROUND

The micropipeline is a basic component of asynchronous processor design. Important building blocks of the micropipeline include the RENDEZVOUS circuit, such as, for example, a chain of Muller-C elements. A Muller-C element allows data to be passed when the current computing logic stage is finished and the next computing logic stage is ready to start. Instead of using non-standard Muller-C elements to realize the handshaking protocol between two clockless (without using clock timing) computing circuit logics, the asynchronous processors replicate the whole processing block (including all computing logic stages) and use a series of tokens and token rings to simulate the pipeline. Each processing block contains a token processing logic to control the usage of tokens without time or clock synchronization between the computing logic stages. Thus, the processor design is referred to as an asynchronous or clockless processor design. The token ring regulates the access to system resources. The token processing logics accept, hold, and pass tokens between each other in a sequential manner. When a token is held by a token processing logic, the block can be granted exclusive access to a resource corresponding to that token, until the token is passed to a next token processing logic in the ring. There is a need for an improved and more efficient asynchronous processor architecture that is capable of processing instructions and computations with less latency or delay.

SUMMARY OF THE INVENTION

In accordance with an embodiment, a method performed by an asynchronous processor includes receiving, at a decoder of a plurality of decoders in a token based fetch, decode, and issue unit of the asynchronous processor, a token enabling exclusive access to a corresponding resource for the token based fetch, decode and issue unit. The token is then held at the decoder, which accesses the corresponding resource. The decoder performs, using the corresponding resource, a function on an instruction received by the decoder, and upon completing the function, releases the token to other decoders.

In accordance with another embodiment, a method performed by a fetch, decode and issue unit in an asynchronous processor includes receiving a plurality of instructions at a plurality of corresponding decoders arranged in a predefined order. The method also includes receiving a plurality of tokens at the corresponding decoders, wherein the tokens allow the corresponding receiving decoders to exclusively access a plurality of corresponding decoding resources in the fetch, decode and issue unit and associated with the tokens. The decoders decode, independently from each other, the instructions using the corresponding decoding resources, and upon completing the decoding using the corresponding decoding resources, release the tokens.

In accordance with yet another embodiment, an apparatus for an asynchronous processor comprises an execution unit for asynchronous execution of a plurality of instructions, and a fetch, decode and issue unit configured for asynchronous decoding of the instructions. The fetch, decode and issue unit comprises a plurality of resources supporting functions of the fetch, decode and issue unit, and a plurality of decoders arranged in a predefined order for passing a plurality of tokens. The tokens control access of the decoders to the resources and allow the decoders exclusive access to the resources. The fetch, decode and issue unit also comprises an issuer unit for issuing the instructions from the decoders to the execution unit.

The foregoing has outlined rather broadly the features of an embodiment of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of embodiments of the invention will be described hereinafter, which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiments disclosed may be readily utilized as a basis for modifying or designing other structures or processes for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIG. 1 illustrates a Sutherland asynchronous micropipeline architecture;

FIG. 2 illustrates a token ring architecture;

FIG. 3 illustrates an asynchronous processor architecture;

FIG. 4 illustrates token based pipelining with gating within an arithmetic and logic unit (ALU);

FIG. 5 illustrates token based pipelining with passing between ALUs;

FIG. 6 illustrates a synchronous fetch, decoding, and issue unit;

FIG. 7 illustrates an embodiment of a token based fetch, decode, and issue unit architecture;

FIG. 8 illustrates an embodiment of a token gating system for a token based fetch, decode, and issue unit;

FIG. 9 illustrates an embodiment of a token passing system for a token based fetch, decode, and issue unit; and

FIG. 10 illustrates an embodiment of a method applying a token based fetch, decode, and issue unit.

Corresponding numerals and symbols in the different figures generally refer to corresponding parts unless otherwise indicated. The figures are drawn to clearly illustrate the relevant aspects of the embodiments and are not necessarily drawn to scale.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

The making and using of the presently preferred embodiments are discussed in detail below. It should be appreciated, however, that the present invention provides many applicable inventive concepts that can be embodied in a wide variety of specific contexts. The specific embodiments discussed are merely illustrative of specific ways to make and use the invention, and do not limit the scope of the invention.

FIG. 1 illustrates a Sutherland asynchronous micropipeline architecture. The Sutherland asynchronous micropipeline architecture is one form of asynchronous micropipeline architecture that uses a handshaking protocol to operate the micropipeline building blocks. The Sutherland asynchronous micropipeline architecture includes a plurality of computing logics linked in sequence via flip-flops or latches. The computing logics are arranged in series and separated by the latches between each two adjacent computing logics. The handshaking protocol is realized by Muller-C elements (labeled C) that control the latches and thus determine whether and when to pass information between the computing logics. This allows for asynchronous or clockless control of the pipeline without the need for a timing signal. A Muller-C element has an output coupled to a respective latch and two inputs coupled to two other adjacent Muller-C elements, as shown. Each signal has one of two states (e.g., 1 and 0, or true and false). The input signals to the Muller-C elements are indicated by A(i), A(i+1), A(i+2), A(i+3) for the backward direction and R(i), R(i+1), R(i+2), R(i+3) for the forward direction, where i, i+1, i+2, i+3 indicate the respective stages in the series. The inputs in the forward direction to the Muller-C elements are delayed signals, via delay logic stages. The Muller-C element also has a memory that stores the state of its previous output signal to the respective latch. A Muller-C element sends the next output signal according to the input signals and the previous output signal. Specifically, if the two input signals, R and A, to the Muller-C element have different states, then the Muller-C element outputs A to the respective latch. Otherwise, the previous output state is held. The latch passes the signals between the two adjacent computing logics according to the output signal of the respective Muller-C element. The latch has a memory of the last output signal state.
If there is a state change in the current output signal to the latch, then the latch allows the information (e.g., one or more processed bits) to pass from the preceding computing logic to the next logic. If there is no change in the state, then the latch blocks the information from passing. This Muller-C element is a non-standard chip component that is not typically supported in the function libraries that manufacturers provide for various chip components and logics. Therefore, implementing on a chip the function of the architecture above based on the non-standard Muller-C elements is challenging and not desirable.
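The following Python sketch is an illustrative behavioral model of the two building blocks described above, not a circuit description. `MullerC` models the textbook two-input C-element: its output follows the inputs when they agree and holds its previous state when they disagree. `TransitionLatch` passes data only when its control signal changes state, matching the latch behavior above. Both class names are hypothetical. The description above, in which the element acts when R and A differ, is assumed to correspond to driving one input of such an element through an inverter, which this sketch leaves out.

```python
from typing import Any, Optional


class MullerC:
    """Behavioral model of a two-input Muller C-element (illustrative only)."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial  # the element's memory of its previous output

    def step(self, a: int, b: int) -> int:
        # Inputs agree: the output follows them.
        # Inputs disagree: the previous output is held.
        if a == b:
            self.out = a
        return self.out


class TransitionLatch:
    """Latch that passes data only when its control signal changes state."""

    def __init__(self) -> None:
        self.last_ctrl = 0
        self.value: Optional[Any] = None

    def update(self, ctrl: int, data: Any) -> Optional[Any]:
        if ctrl != self.last_ctrl:
            # State change: let the data from the preceding logic through.
            self.value = data
            self.last_ctrl = ctrl
        # No state change: block new data and keep the old value.
        return self.value
```

For example, starting from 0, `MullerC().step(1, 0)` keeps 0, and only a later `step(1, 1)` drives the output to 1.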

FIG. 2 illustrates an example of a token ring architecture, which is a suitable alternative to the architecture above in terms of chip implementation. The components of this architecture are supported by standard function libraries for chip implementation. As described above, the Sutherland asynchronous micropipeline architecture requires the handshaking protocol, which is realized by the non-standard Muller-C elements. In order to avoid using Muller-C elements (as in FIG. 1), a series of token processing logics are used to control the processing of different computing logics (not shown), such as processing units on a chip (e.g., ALUs) or other functional calculation units, or the access of the computing logics to system resources, such as registers or memory. To cover the long latency of some computing logics, the token processing logic is replicated into several copies and arranged in a series of token processing logics, as shown. Each token processing logic in the series controls the passing of one or more token signals (associated with one or more resources). A token signal passing through the token processing logics in series forms a token ring. The token ring regulates the access of the computing logics (not shown) to the system resource (e.g., memory, register) associated with that token signal. The token processing logics accept, hold, and pass the token signal between each other in a sequential manner. When a token signal is held by a token processing logic, the computing logic associated with that token processing logic is granted exclusive access to the resource corresponding to that token signal, until the token signal is passed to a next token processing logic in the ring. Holding and passing the token signal concludes the logic's access to or use of the corresponding resource, and is referred to herein as consuming the token. Once the token is consumed, it is released by this logic to a subsequent logic in the ring.
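A minimal Python sketch of a single token circulating in such a ring may help make the exclusivity rule concrete. It assumes one token guarding one resource and a fixed ring of `n_logics` token processing logics; the names `TokenRing`, `access`, and `release` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TokenRing:
    """One token circulating among n token processing logics (illustrative)."""

    n_logics: int
    holder: int = 0  # index of the logic currently holding the token
    log: List[str] = field(default_factory=list)

    def access(self, logic: int, resource: str) -> bool:
        # Only the current holder is granted exclusive access to the resource.
        if logic != self.holder:
            return False
        self.log.append(f"logic{logic}:{resource}")
        return True

    def release(self, logic: int) -> int:
        # Consuming the token ends the access; the token moves to the next
        # logic in the ring, wrapping around after the last one.
        if logic != self.holder:
            raise RuntimeError("only the holder can release the token")
        self.holder = (self.holder + 1) % self.n_logics
        return self.holder
```

Because `access` checks the holder index, at most one logic can use the resource at a time, and the sequential `release` reproduces the fixed passing order of the ring.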

FIG. 3 illustrates an asynchronous processor architecture. The architecture includes a plurality of self-timed (asynchronous) arithmetic and logic units (ALUs) coupled in parallel in a token ring architecture as described above. The ALUs can comprise or correspond to the token processing logics of FIG. 2. The asynchronous processor architecture of FIG. 3 also includes a feedback engine for properly distributing incoming instructions between the ALUs, an instruction/timing history table accessible by the feedback engine for determining the distribution of instructions, a register (memory) accessible by the ALUs, and a crossbar for exchanging needed information between the ALUs. The table is used for indicating timing and dependency information between multiple input instructions to the processor system. The instructions from the instruction cache/memory go through the feedback engine, which detects or calculates the data dependencies and determines the timing for instructions using the history table. The feedback engine pre-decodes each instruction to decide how many input operands this instruction requires. The feedback engine then looks up the history table to find whether each piece of data is on the crossbar or in the register file. If the data is found on the crossbar bus, the feedback engine calculates which ALU produces the data. This information is tagged to the instruction dispatched to the ALUs. The feedback engine also updates the history table accordingly.
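The history-table lookup described above can be sketched in Python as follows. This is a simplified model under explicit assumptions that are not taken from the description: instructions are dispatched round-robin to the ALUs in ring order, each instruction is given as a destination register plus source registers, and a produced value is assumed to stay available on the crossbar indefinitely (a real table would age or invalidate entries). The function name `dispatch` and the tag strings are hypothetical.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str]]  # (destination register, source registers)


def dispatch(instrs: List[Instr], n_alus: int) -> List[Tuple[int, Dict[str, str]]]:
    """Tag each source operand with where its value is found (illustrative).

    Returns one (alu, tags) pair per instruction, where tags maps each source
    register to "regfile" or "crossbar:ALU<k>".
    """
    history: Dict[str, int] = {}  # register -> ALU that last produced it
    out = []
    for i, (dest, srcs) in enumerate(instrs):
        alu = i % n_alus  # assumption: round-robin dispatch in ring order
        tags = {}
        for src in srcs:
            if src in history:
                # Produced by an earlier in-flight instruction: take it
                # from the crossbar and record the producing ALU.
                tags[src] = f"crossbar:ALU{history[src]}"
            else:
                tags[src] = "regfile"
        history[dest] = alu  # update the history table for later readers
        out.append((alu, tags))
    return out
```

Sources are looked up before the destination is recorded, so an instruction that reads and writes the same register still sees the previous producer.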

FIG. 4 illustrates token based pipelining with gating within an ALU, also referred to herein as token based pipelining for an intra-ALU token gating system. According to this pipelining, designated tokens are used to gate other designated tokens in a given order of the pipeline. This means that when a designated token passes through an ALU, a second designated token is then allowed to be processed and passed by the same ALU in the token ring architecture. In other words, releasing one token by the ALU becomes a condition to consume (process) another token in that ALU in that given order. FIG. 4 illustrates one possible example of a token-gating relationship. The tokens used include a launch token (L), a register access token (R), a jump token (PC), a memory access token (M), an instruction pre-fetch token (F), optionally other resource tokens, and a commit token (W). Consuming (processing) the L token enables the ALU to start and decode an instruction. Consuming the R token enables the ALU to read values from a register file. Consuming the PC token enables the ALU to decide whether a jump to another instruction is needed in accordance with a program counter (PC). Consuming the M token enables the ALU to access a memory that caches instructions. Consuming the F token enables the ALU to fetch the next instruction from memory. Consuming other resource tokens enables the ALU to use or access such resources. Consuming the W token enables the ALU to write or commit the processing and calculation results for instructions to the memory. Specifically, in this example, the launch token (L) gates the register access token (R), which in turn gates the jump token (PC token). The jump token gates the memory access token (M), the instruction pre-fetch token (F), and possibly other resource tokens that may be used. This means that tokens M, F, and other resource tokens can only be consumed by the ALU after passing the jump token. These tokens gate the commit token (W) to register or memory.
The commit token is also referred to herein as a token for writing the instruction. The commit token in turn gates the launch token. The gating signal from the gating token (a token in the pipeline) is used as an input into a consumption condition logic of the gated token (the token in the next order of the pipeline). For example, the launch token (L) generates an active signal to the register access or read token (R) when L is released to the next ALU. This guarantees that no ALU reads the register file until an instruction is actually started by the launch token.
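The gating relationship of the FIG. 4 example can be expressed as a small consumption-condition check. The following Python sketch assumes that "other resource tokens" are omitted, and it models the W-gates-L back edge by starting a fresh instruction round whenever W is released. The names `GATES`, `gated_by`, and `AluTokenGate` are hypothetical.

```python
from typing import Dict, List, Set

# FIG. 4 example: token -> tokens it gates (other resource tokens omitted).
GATES: Dict[str, List[str]] = {
    "L": ["R"],
    "R": ["PC"],
    "PC": ["M", "F"],
    "M": ["W"],
    "F": ["W"],
    "W": ["L"],
}


def gated_by(token: str) -> Set[str]:
    """Tokens whose release is a precondition for consuming `token`."""
    return {g for g, targets in GATES.items() if token in targets}


class AluTokenGate:
    """Per-ALU consumption-condition logic for one instruction round."""

    def __init__(self) -> None:
        self.released: Set[str] = set()  # tokens released in this round

    def can_consume(self, token: str) -> bool:
        if token == "L":
            # W gates L: a new launch is allowed only once the previous
            # round was closed by W, modelled here as an empty round.
            return not self.released
        return gated_by(token) <= self.released

    def release(self, token: str) -> None:
        if not self.can_consume(token):
            raise RuntimeError(f"token {token} is still gated")
        if token == "W":
            self.released.clear()  # commit closes the round, re-enabling L
        else:
            self.released.add(token)
```

With this check, R becomes consumable only after L is released, and W only after both M and F are released, which mirrors the order L, R, PC, {M, F}, W described above.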

FIG. 5 illustrates token based pipelining with passing between ALUs, also referred to herein as token based pipelining for an inter-ALU token passing system. According to this pipelining, a consumed token signal can trigger a pulse to a common resource. For example, the register access token (R) triggers a pulse to the register file. The token signal is delayed before it is released to the next ALU for a period that prevents a structural hazard on this common resource (the register file) between ALU-(n) and ALU-(n+1). The tokens ensure that multiple ALUs launch and commit (or write) instructions in program counter order, and also avoid structural hazards among the multiple ALUs.
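The effect of delaying the token release can be checked with simple timing arithmetic. The following Python sketch assumes that each holder fires a register-file pulse of fixed length `pulse` as soon as it consumes the R token and releases the token `delay` time units later; the function names are hypothetical. Under these assumptions, consecutive pulses do not overlap whenever `delay >= pulse`.

```python
from typing import List, Tuple

Pulse = Tuple[int, float, float]  # (alu, pulse start, pulse end)


def pass_register_token(n_alus: int, arrival: float, pulse: float,
                        delay: float) -> List[Pulse]:
    """R token passed ALU-0 -> ALU-1 -> ... once around the ring."""
    t = arrival
    schedule = []
    for alu in range(n_alus):
        # Consuming the token triggers a pulse to the shared register file.
        schedule.append((alu, t, t + pulse))
        # The token is held for `delay` before release to the next ALU.
        t += delay
    return schedule


def has_structural_hazard(schedule: List[Pulse]) -> bool:
    # Hazard if a pulse starts before the previous ALU's pulse has ended.
    return any(nxt[1] < cur[2] for cur, nxt in zip(schedule, schedule[1:]))
```

For example, with `pulse=2.0`, a release delay of 2.0 gives back-to-back, non-overlapping pulses, while a delay of 1.0 lets ALU-(n+1) drive the register file while ALU-(n) is still using it.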

FIG. 6 illustrates a synchronous fetch, decoding, and issue unit, which is typically used in an asynchronous processor architecture. A typical fetch/decode/issue unit comprises a fetch function or logic, a decode function, and an issue function. The functions can be implemented by suitable circuit logic. The fetch function fetches the instructions from cache/memory, performs branch/jump prediction, stacks the return instruction addresses, and calculates and checks the effective instruction addresses. The decode function decodes the instructions, processes change-of-flow (COF) reports for the instructions, buffers the instructions, and scoreboards the instructions. The issue function remaps the operands of the instructions and dispatches the instructions to the ALUs. The synchronous fetch, decoding, and issue unit of FIG. 6 corresponds to the feedback engine in FIG. 3. The synchronous fetch, decoding, and issue unit distributes and sends the instructions to the ALUs of the asynchronous processor. The ALUs are arranged in a token ring architecture as shown in FIG. 3.
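Stacking the return instruction addresses is commonly done with a bounded return address stack. The following Python sketch is illustrative only and rests on assumptions not stated in the description: instructions are 4 bytes long, the pushed address is that of the instruction following the call, and on overflow the oldest entry is discarded. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class ReturnAddressStack:
    """Bounded return address stack (illustrative sketch)."""

    def __init__(self, depth: int = 8) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.entries: Deque[int] = deque(maxlen=depth)

    def on_call(self, pc: int, instr_size: int = 4) -> None:
        # Push the address of the instruction following the call.
        self.entries.append(pc + instr_size)

    def on_return(self) -> Optional[int]:
        # Pop the predicted return address; None if the stack is empty.
        return self.entries.pop() if self.entries else None
```

Returns are predicted in last-in, first-out order, which matches nested function calls; an empty or overflowed stack simply yields no prediction for the affected returns.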

In the above asynchronous processor design, the fetch/decode/issue stages occupy a substantial portion of the total length of the instruction processing pipeline. The pipeline can become even longer in some processor designs, which increases delays such as the pipeline flush penalty incurred on branch prediction and decision. It is also desirable that the pipeline be easily expandable: many operations are expected to be performed at this stage, and newer operations may be added.

The system and method embodiments herein are described in the context of an ALU set in the asynchronous processor. The ALUs serve as instruction processing units that perform calculations and provide results for the corresponding issued instructions. However, in other embodiments, the processor may comprise other instruction processing units instead of the ALUs. The instruction processing units may sometimes be referred to as execution units (XUs) or execution logics, and may have similar, different, or additional functions for handling instructions compared with the ALUs described above. In general, the system and method embodiments described herein can apply to any instruction execution or processing units that operate, in an asynchronous processor architecture, using a token based fetch, decode, and issue unit and its token gating and passing systems described below.

FIG. 7 illustrates an embodiment of a token based fetch, decode, and issue unit architecture that overcomes the disadvantages of the typical fetch, decode, and issue unit and meets the requirements above. Specifically, the architecture establishes an asynchronous fetch/decode/issue unit by a token system, in which different resources are accessed and controlled in an asynchronous manner so that multiple instructions can be handled at about the same time. The architecture includes a plurality of decoders (decoder-0 to decoder-N) that decode instructions asynchronously (separately or substantially independently). The incoming instructions can be queued before being sent to the appropriate decoders. The architecture also includes a plurality of processing resources that can be accessed by the decoders to support the handling and decoding of the instructions. The resources may include a branch prediction table (BTB), a return address stack (RAS), a register window, a bookkeep/scoreboard, loop predictors, an instruction queue buffer, an issuer for issuing the decoded instructions properly to corresponding ALUs or any suitable type of XUs, a program counter (PC) for controlling instruction jumps according to COF information from the execution unit, and optionally other resources. The functionalities of the decoders and their resources are described in Table 1 below. The functions can be implemented by any suitable circuit logic.

TABLE 1: Resources of the token based fetch, decode, and issue unit

Functionality | Description
Decoder | Early-decode the instruction to decide the type of instruction (jump, call, return, or other)
BTB | Branch prediction table, e.g., a bimodal predictor, a global-history-table-based predictor, or other prediction algorithms
RAS | Return address stack: on entry into a function, push the PC address; on return from a function, pop the PC address
Register window | On entry into or return from a function, update the register window; for other instructions, de-map (remove the mapping of) the operands
Bookkeep/scoreboard | Detect data hazards and calculate data dependencies, log the data dependency information, and decide whether an instruction is ready for issue (scoreboard)
Loop predictors | If the loop counter is given by an immediate value of an instruction, predict loops; support nested loops
Instruction queue buffer | Every issued instruction is registered into this buffer
Issuer | Issue the instruction to the execution unit (the set of XUs, e.g., ALUs); can actively push the instructions or passively wait for a request
PC | Monitor the COF requests from the execution unit; a request can be a branch PC jump or an exception/interruption
Others | Any other functionality at the fetch/decode/issue stages, e.g., an address generation unit (AGU), access to an address register, or access to a special register
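Table 1 names the bimodal predictor as one option for the BTB resource. The following Python sketch shows the textbook bimodal scheme, a table of 2-bit saturating counters; it is an illustration, not the predictor used in this embodiment. The table size, the initial "weakly not taken" state, and indexing by the low PC bits of 4-byte-aligned instructions are assumptions, and the class name is hypothetical.

```python
from typing import List


class BimodalPredictor:
    """Table of 2-bit saturating counters (textbook bimodal scheme).

    Counter values 0-1 predict not taken; values 2-3 predict taken.
    """

    def __init__(self, entries: int = 16) -> None:
        self.entries = entries
        self.counters: List[int] = [1] * entries  # start weakly not taken

    def _index(self, pc: int) -> int:
        # Assumption: 4-byte-aligned instructions, indexed by low PC bits.
        return (pc >> 2) % self.entries

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Move toward the actual outcome, saturating at 0 and 3.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

The two-bit hysteresis means that a single anomalous outcome (for example, a loop exit) does not immediately flip a strongly established prediction.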

The decoders' exclusive access to the various resources is controlled using a token system. Specifically, a decoder is granted exclusive access to a resource while it holds the token for that resource, and gives up that access when it releases the token to another decoder. The tokens are gated and passed by the decoders according to a defined token pipelining (a defined order of tokens). FIG. 8 illustrates an embodiment of a token gating system for the token based fetch, decode, and issue unit in the asynchronous processor architecture. This intra-decoder token gating system can form a cascade of the instruction fetch, decode, and issue stages. The token gating follows a principle similar to that described for the token based pipelining with gating in FIG. 4. Specifically, in FIG. 8, designated tokens are used to gate other designated tokens in a given order of the pipeline. This means that when a designated token passes through a decoder of the fetch, decode, and issue unit, a second designated token is then allowed to be processed and passed by the same decoder. In other words, releasing one token by the decoder becomes a condition to consume (process) another token in that decoder in that given order. The tokens can be passed according to the order of the arrangement of the decoders (a defined order) in the fetch, decode and issue unit. In an embodiment, the decoders are arranged in a ring architecture similar to that of the ALUs in FIG. 3. FIG. 8 illustrates one possible example of a token-gating relationship. The tokens used include a fetch and decode token, a RAS token, a BTB token, a loop prediction token, a bookkeep token, a register (Reg) token, one or more other resource (others) tokens, a PC token, an issuer token, and an instruction-queue buffer token.

Consuming (processing) the fetch and decode token enables the decoder to fetch and decode an instruction. Consuming the RAS, BTB, loop prediction, bookkeep, register window, and other resource token(s) enables the decoder to access such resources exclusively, without the other decoders. Consuming the PC token enables the decoder to decide whether a jump to another instruction is needed in accordance with a program counter (PC). Consuming the issuer token enables the decoder to send the instruction to the issuer, which then issues the instruction to an XU. Consuming the instruction-queue buffer token enables the decoder to access the instruction-queue buffer. Specifically, in this embodiment, the fetch and decode token gates the RAS, BTB, loop prediction, bookkeep, register window, and other resource token(s). These resource tokens, in turn, gate the PC token. The PC token gates the issuer token and the instruction-queue buffer token, which both gate the fetch and decode token. For example, the fetch and decode token generates an active signal to the register window token when the fetch and decode token is released to another decoder. This guarantees that no decoder updates the register window until an instruction is actually fetched and decoded.
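One way to check the gating relationship of this embodiment is to treat it as a dependency graph and derive a legal consumption order for a single instruction. The following Python sketch uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later). The token names are hypothetical labels for the tokens listed above, and the back edge by which the issuer and instruction-queue buffer tokens gate the fetch and decode token for the next instruction is deliberately omitted so that one round forms an acyclic graph.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

RESOURCE_TOKENS = ["RAS", "BTB", "LOOP", "BOOKKEEP", "REG_WINDOW", "OTHERS"]

# token -> tokens that gate it (must be released first), per the example above.
GATED_BY: Dict[str, Set[str]] = {
    "FETCH_DECODE": set(),
    **{t: {"FETCH_DECODE"} for t in RESOURCE_TOKENS},
    "PC": set(RESOURCE_TOKENS),
    "ISSUER": {"PC"},
    "IQ_BUFFER": {"PC"},
}


def consumption_order() -> List[str]:
    """One legal order in which a decoder may consume its tokens."""
    # TopologicalSorter takes a mapping of node -> predecessors.
    return list(TopologicalSorter(GATED_BY).static_order())
```

The resource tokens have no gating relationship among themselves, so any order among them (or concurrent consumption) is legal; only their position between the fetch and decode token and the PC token is fixed.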

The token based fetch, decode, and issue unit architecture and its token gating system above are one embodiment or example of implementation. A practical realization may differ but follows a similar token based principle. For instance, in practical cases where there are other functions to be executed at this stage, a resource/functional block is inserted into this architecture. A token is created to indicate the decoder's exclusive access to the added resource/functional block, and the token is integrated into the token system (gated and passed) as described above.
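Integrating a new resource token can be sketched as an edit to a gating map of the form "token -> tokens that gate it". The Python sketch below assumes, for illustration, that the new token sits where the existing resource tokens sit: gated by the fetch and decode token and gating the PC token. The function name `add_resource_token`, the default token names, and the AGU example are hypothetical.

```python
from typing import Dict, Set

GatingMap = Dict[str, Set[str]]  # token -> tokens that gate it


def add_resource_token(gated_by: GatingMap, token: str,
                       after: str = "FETCH_DECODE",
                       before: str = "PC") -> GatingMap:
    """Return a copy of `gated_by` with a new resource token inserted.

    The new token is gated by `after` and in turn gates `before`.
    """
    if token in gated_by:
        raise ValueError(f"token {token!r} already exists")
    new = {k: set(v) for k, v in gated_by.items()}  # leave the input intact
    new[token] = {after}
    new[before] = new[before] | {token}
    return new
```

For example, adding an "AGU" token to a map containing only a RAS resource token makes the PC token wait for both RAS and AGU, while the rest of the gating chain is unchanged.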

FIG. 9 illustrates an embodiment of a token passing system for a token based fetch, decode, and issue unit. The system can be implemented between the multiple decoders in the asynchronous (token based) fetch, decode and issue unit. This inter-decoder token passing system preserves the program counter (PC) order and avoids structural hazards, e.g., resource conflicts among multiple decoders.

According to this pipelining system, a consumed token signal can trigger a pulse to a resource common to the decoders. For example, the PC token triggers the monitoring of the COF requests (e.g., branch PC jump or exception/interruption requests) from the execution unit. The token signal is delayed before it is released to the next decoder for a period that prevents a structural hazard on this common resource between Decoder-n and Decoder-n+1. The tokens ensure that multiple decoders decode and issue instructions in program counter order, and also avoid structural hazards among the multiple decoders.
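Why token passing preserves program counter order even when decoders finish at different times can be shown with a short timing sketch. The Python code below assumes that the i-th instruction in PC order is held by Decoder-i, that the decoders finish decoding at arbitrary times, and that the issuer token is released to the next decoder a fixed `delay` after each issue. The function name and parameters are hypothetical.

```python
from typing import List, Tuple


def issue_schedule(decode_done: List[float],
                   delay: float) -> List[Tuple[int, float]]:
    """Issuer token passed Decoder-0 -> Decoder-1 -> ... (illustrative).

    decode_done[i] is when Decoder-i, holding the i-th instruction in PC
    order, finishes decoding. Returns (instruction index, issue time).
    """
    token_at = 0.0  # time the issuer token reaches the current decoder
    out = []
    for i, done in enumerate(decode_done):
        # A decoder issues only once it has both finished decoding and
        # received the issuer token.
        t = max(done, token_at)
        out.append((i, t))
        # Delayed release keeps consecutive issues from overlapping.
        token_at = t + delay
    return out
```

For example, if Decoder-1 and Decoder-2 finish before Decoder-0, they still wait for the token, so the instructions issue strictly in PC order.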

FIG. 10 illustrates an embodiment of a method applying an asynchronous (token based) fetch, decode, and issue unit architecture. At step 1010, a decoder of a plurality of decoders in a token based fetch, decode, and issue unit of the processor receives a token enabling exclusive access to one or more resources for the fetch, decode and issue unit. For instance, the token is one of the tokens of the token based fetch, decode, and issue unit architecture described above. At step 1020, the decoder holds the token and accesses the corresponding resource (exclusively, without the other decoders) to perform a related function on an instruction received by the decoder. At step 1030, upon completing the function, the decoder releases the token to the other decoders of the fetch, decode and issue unit. At step 1040, if the consumed token at the decoder was an issuer token, the instruction is issued, e.g., by an issuer logic, to an XU or ALU. The method enables the decoders to operate on and decode the instructions in an asynchronous manner. For example, multiple decoders can fetch multiple instructions while accessing different resources during the same time period.
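Steps 1010 through 1040 for a single decoder can be sketched in Python as follows. The sketch assumes that each resource is represented by a callable that performs its function on the instruction, and that the issuer logic is a callback; `Decoder`, `run_step`, the `"ISSUER"` label, and the callback are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Optional


class Decoder:
    """One decoder following steps 1010-1040 (illustrative sketch)."""

    def __init__(self, name: str, issue: Callable[[str], None]) -> None:
        self.name = name
        self.issue = issue  # hypothetical hook into the issuer logic
        self.held: Optional[str] = None

    def run_step(self, token: str, instruction: str,
                 resources: Dict[str, Callable[[str], None]]) -> str:
        # Steps 1010/1020: receive and hold the token.
        self.held = token
        # Step 1020: exclusive access to the resource tied to this token.
        resources[token](instruction)
        # Step 1030: release the token to the other decoders.
        released, self.held = self.held, None
        # Step 1040: if the consumed token was the issuer token, issue.
        if released == "ISSUER":
            self.issue(instruction)
        return released
```

A decoder that consumes, for example, a BTB token performs only the branch prediction function on its instruction; the instruction reaches the issuer callback only when the same decoder later consumes the issuer token.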

While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system, or certain features may be omitted or not implemented.

In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component, whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.

What is claimed is:
 1. A method performed by an asynchronous processor, the method comprising: receiving, at a decoder in a plurality of decoders in a token based fetch, decode, and issue unit of the asynchronous processor, a token enabling exclusive access to a corresponding resource for the token based fetch, decode and issue unit; holding the token at the decoder; accessing the corresponding resource; performing, using the corresponding resource, a function on an instruction received by the decoder; and upon completing the function, releasing, at the decoder, the token to other decoders.
 2. The method of claim 1, wherein the corresponding resource is accessed exclusively by the decoder without the other decoders, until the releasing of the token by the decoder.
 3. The method of claim 1, wherein the token is an issuer token for issuing the instruction from the token based fetch, decode and issue unit to an execution unit of the asynchronous processor, and wherein the method further comprises issuing the instruction to the execution unit.
 4. The method of claim 1 further comprising: after releasing the token, receiving at the decoder a second token enabling exclusive access to a second resource for the token based fetch, decode and issue unit; holding the second token at the decoder; accessing the second resource; performing, using the second resource, a second function on the instruction or a second instruction received by the decoder; and upon completing the second function, releasing, at the decoder, the second token to other decoders.
 5. The method of claim 1, wherein the token is one of a plurality of tokens received by the decoders for accessing corresponding resources in accordance with a predefined order of token pipelining and token-gating relationship.
 6. The method of claim 5 further comprising passing, in accordance with the predefined order of token pipelining and token-gating relationship, the tokens from the decoder to a next decoder in an arranged order of the decoders in the token based fetch, decode and issue unit.
 7. The method of claim 5, wherein the resources include at least one of a return address stack (RAS), a branch prediction table (BTB), a register window, a bookkeep or scoreboard, a loop predictor, an instruction-queue buffer, an issuer for issuing instructions to an execution unit, and a program counter (PC) unit for deciding whether a jump for handling an instruction is needed in accordance with a PC.
 8. The method of claim 7, wherein, in accordance with the predefined order of token pipelining and token-gating relationship, releasing a token for fetching and decoding an instruction is a condition to receive resource tokens for accessing and using the RAS, the BTB, the register window, the bookkeep or scoreboard, and the loop predictor, wherein releasing the resource tokens is a condition to receive a token for PC jumps, and wherein releasing the token for PC jumps is a condition to receive a token for issuing the instruction and a token for accessing and using an instruction-queue buffer.
9. A method performed by a fetch, decode and issue unit in an asynchronous processor, the method comprising: receiving a plurality of instructions at a plurality of corresponding decoders arranged in a predefined order; receiving a plurality of tokens at the corresponding decoders, wherein the tokens allow the corresponding receiving decoders to exclusively access a plurality of corresponding decoding resources in the fetch, decode and issue unit and associated with the tokens; decoding, at the decoders independently from each other, the instructions using the corresponding decoding resources; and upon completing the decoding using the corresponding decoding resources, releasing the tokens at the decoders.
10. The method of claim 9, wherein the released tokens are available to be received and used by the other decoders to exclusively access the corresponding decoding resources associated with the tokens.
11. The method of claim 9, wherein the tokens are received in accordance with a predefined order of token pipelining and token-gating relationship.
12. The method of claim 11, further comprising passing, in accordance with the predefined order of token pipelining and token-gating relationship, the tokens between the decoders in an arranged order of the decoders.
13. The method of claim 9, wherein the decoding resources include at least one of a return address stack (RAS), a branch prediction table (BTB), a registry window, a bookkeep or scoreboard, a loop predicator, an instruction-queue buffer, an issuer for issuing instructions to an execution unit, and a program counter (PC) unit for deciding whether a jump for handling an instruction is needed in accordance with a PC.
14. An apparatus for an asynchronous processor comprising: an execution unit for asynchronous execution of a plurality of instructions; and a fetch, decode and issue unit configured for asynchronous decoding of the instructions and comprising: a plurality of resources supporting functions of the fetch, decode and issue unit; a plurality of decoders arranged in a predefined order for passing a plurality of tokens, wherein the tokens control access of the decoders to the resources and allow the decoders exclusive access to the resources; and an issuer unit for issuing the instructions from the decoders to the execution unit.
15. The apparatus of claim 14, wherein the fetch, decode and issue unit further comprises a program counter (PC) unit configured to decide whether a jump for handling a new instruction is needed in accordance with a PC and further in accordance with change-of-flow (COF) information from the execution unit.
16. The apparatus of claim 15, wherein the resources include at least one of a return address stack (RAS), a branch prediction table (BTB), a registry window, a bookkeep or scoreboard, a loop predicator, and an instruction-queue buffer.
17. The apparatus of claim 16, wherein the decoders are further configured to receive the tokens in accordance with a predefined order of token pipelining and token-gating relationship.
18. The apparatus of claim 17, wherein, in accordance with the predefined order of token pipelining and token-gating relationship, releasing a token for fetching and decoding an instruction is a condition to receive resource tokens for accessing and using the RAS, the BTB, the registry window, the bookkeep or scoreboard, and the loop predicator, wherein releasing the resource tokens is a condition to receive a token for PC jumps, and wherein releasing the token for PC jumps is a condition to receive a token for using the instruction and a token for accessing and using the instruction-queue buffer.
19. The apparatus of claim 14, wherein the execution unit comprises a plurality of arithmetic and logic units (ALUs) arranged in a ring architecture for passing a plurality of second tokens, and wherein the second tokens control access of the ALUs to a plurality of corresponding second resources for the execution unit.
20. The apparatus of claim 14, wherein the resources, the decoders, and the issuer are configured via circuit logic.
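Claims 5 through 8, 17 and 18 describe token pipelining and token gating. A small software model can help illustrate these ideas. The Python below is a minimal sketch under stated assumptions. It is not the claimed circuit.

The following names and choices are hypothetical:

- the token names `fetch_decode`, `pc_jump`, `issue` and `iq_buffer`;
- the `TokenRing` class and the `simulate` function;
- the decision to start every token at decoder 0;
- the decision to treat the claims' "token for using the instruction" as an `issue` token.

The model enforces two rules. First, a decoder must release every token of one stage before it may receive the tokens of the next stage (gating). Second, each released token passes to the next decoder in the ring (pipelining).

```python
from typing import Dict, List, Tuple

# Hypothetical token groups, ordered by the token-gating relationship of claims 8 and 18:
# a decoder must release every token of stage k before it may receive the tokens of stage k + 1.
STAGES: List[Tuple[str, ...]] = [
    ("fetch_decode",),
    ("ras", "btb", "registry_window", "scoreboard", "loop_predictor"),
    ("pc_jump",),
    ("issue", "iq_buffer"),  # "issue" is an assumed stand-in for the token for using the instruction
]


class TokenRing:
    """Tracks, for each token, which decoder in the ring may receive it next."""

    def __init__(self, num_decoders: int) -> None:
        self.n = num_decoders
        # Assumption: every token starts at decoder 0.
        self.next_holder: Dict[str, int] = {t: 0 for stage in STAGES for t in stage}

    def can_take(self, decoder: int, stage: int) -> bool:
        # Token pipelining: a decoder may receive a token only once the ring has delivered it there.
        return all(self.next_holder[t] == decoder for t in STAGES[stage])

    def release(self, decoder: int, stage: int) -> None:
        # Pass each released token to the next decoder in the predefined order.
        for t in STAGES[stage]:
            self.next_holder[t] = (decoder + 1) % self.n


def simulate(num_decoders: int, num_instructions: int) -> List[int]:
    """Return the index of the decoder that issued each instruction, in issue order."""
    ring = TokenRing(num_decoders)
    stage_of = [0] * num_decoders  # stage each decoder will request next
    issued: List[int] = []
    while len(issued) < num_instructions:
        progressed = False
        for d in range(num_decoders):
            s = stage_of[d]
            if not ring.can_take(d, s):
                continue
            # While holding this stage's tokens, decoder d has exclusive use of their
            # resources; the sketch performs the work and releases in a single step.
            ring.release(d, s)
            progressed = True
            stage_of[d] = (s + 1) % len(STAGES)
            if s == len(STAGES) - 1:
                issued.append(d)
                if len(issued) == num_instructions:
                    break
        if not progressed:
            raise RuntimeError("deadlock: no decoder can receive its next tokens")
    return issued


print(simulate(3, 6))  # [0, 1, 2, 0, 1, 2]
```

The issue token travels in ring order, so the sketch issues instructions in decoder order (0, 1, 2, 0, ...). This ordering holds even though each decoder advances only when its tokens arrive. In hardware, receiving and releasing a token would be handshake-driven circuit events rather than iterations of a software loop.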